Biostatistics For Dummies (Monika Wahi John Pezzullo)

groups, in R statistical software, only the Tukey-Kramer test is available, and not Tukey’s HSD test (as demonstrated later in this chapter in the section “Executing and interpreting post-hoc t tests”).

Scheffé’s test compares all pairs of groups, but it also lets you bundle certain groups together if doing so makes physical sense. For example, if you have two treatment groups and a control group (such as Drug A, Drug B, and Control), you may want to determine whether the two drugs, taken together, differ from the control. In other words, you may want to test Drug A and Drug B as one group against the control group, in which case you use Scheffé’s test. Because it is the most conservative of these tests, Scheffé’s test is the safest choice if you are worried about Type I errors (false positives). On the other hand, it is less powerful than the other tests, meaning it is more likely than they are to miss a real difference in your data.
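To make the idea of bundling groups concrete, here is a minimal sketch in base R of a single Scheffé contrast that pools Drug A and Drug B and compares them with Control. The data frame dat, its simulated values, and the helper function scheffe_contrast() are all hypothetical and invented for illustration. The sketch assumes the textbook Scheffé criterion: the contrast's F statistic is divided by (k - 1), where k is the number of groups, and referred to an F distribution with k - 1 and N - k degrees of freedom.

```r
# Minimal sketch of one Scheffe contrast in base R.
# The data are simulated for illustration only (not real study data).
set.seed(42)
dat <- data.frame(
  group = factor(rep(c("DrugA", "DrugB", "Control"), each = 20)),
  y = c(rnorm(20, mean = 100, sd = 10),
        rnorm(20, mean = 102, sd = 10),
        rnorm(20, mean = 110, sd = 10))
)

# Hypothetical helper: tests one contrast with the Scheffe criterion.
# coefs is a named vector of contrast weights that sum to zero.
scheffe_contrast <- function(y, group, coefs) {
  group <- factor(group)
  k <- nlevels(group)                        # number of groups
  N <- length(y)                             # total sample size
  means <- sapply(split(y, group), mean)     # named vector of group means
  ns <- sapply(split(y, group), length)      # named vector of group sizes
  w <- coefs[names(means)]                   # align weights with group order
  # Within-group mean square (the Residuals Mean Sq of a one-way ANOVA)
  mse <- sum((y - means[as.character(group)])^2) / (N - k)
  est <- sum(w * means)                      # contrast estimate
  se <- sqrt(mse * sum(w^2 / ns))            # standard error of the contrast
  # Scheffe: divide the contrast's F by (k - 1) and use F(k - 1, N - k)
  f_stat <- (est / se)^2 / (k - 1)
  p <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
  list(estimate = est, se = se, F = f_stat, p.value = p)
}

# Pool Drug A and Drug B (weight 1/2 each) and compare them with Control
res <- scheffe_contrast(dat$y, dat$group,
                        c(DrugA = 0.5, DrugB = 0.5, Control = -1))
print(res)
```

The weights 1/2, 1/2, and -1 sum to zero, which is what makes this a valid contrast; an ordinary pairwise comparison is just the special case with weights 1 and -1.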

Running an ANOVA

Running a one-way ANOVA in R is similar to running an independent t test (see the earlier section “Executing a t test”). However, in this case, we save the results as an object and then run R code on that object to display the output.

Let’s turn back to the NHANES data. First, we need to prepare our grouping variable, which is the three-level variable MARITAL (where 1 = married, 2 = never married, and 3 = all other marital statuses). Next, we identify our dependent variable, which is our fasting glucose variable called LBXGLU. Finally, we use the aov command to run the ANOVA in R and save the results in an object called GLUCOSE_aov, using the following code: GLUCOSE_aov <- aov(LBXGLU ~ as.factor(MARITAL), data = NHANES). (We have to use the as.factor command on the MARITAL variable so that R handles it as a categorical variable in the calculation, not a numeric one.)
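The effect of the as.factor command shows up directly in the degrees of freedom. In this sketch, the data frame NHANES_sim is a hypothetical, simulated stand-in for the real NHANES data; only the column names MARITAL and LBXGLU are borrowed from the text, and the values are random numbers. Treated as a number, MARITAL gets a single degree of freedom, as if it were a continuous predictor. Treated as a factor with three levels, it gets 3 - 1 = 2.

```r
# Sketch: why as.factor() matters. NHANES_sim is simulated (hypothetical);
# it is not the real NHANES file.
set.seed(1)
NHANES_sim <- data.frame(
  MARITAL = sample(1:3, 300, replace = TRUE),  # groups coded as 1, 2, 3
  LBXGLU  = rnorm(300, mean = 100, sd = 15)    # made-up glucose values
)

fit_numeric <- aov(LBXGLU ~ MARITAL, data = NHANES_sim)            # 1, 2, 3 as numbers
fit_factor  <- aov(LBXGLU ~ as.factor(MARITAL), data = NHANES_sim) # 1, 2, 3 as groups

# Degrees of freedom for the MARITAL term in each ANOVA table
print(summary(fit_numeric)[[1]][["Df"]][1])  # 1: a single slope
print(summary(fit_factor)[[1]][["Df"]][1])   # 2: three groups minus one
```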

Next, we can get our output by running a summary command on this object using this code: summary(GLUCOSE_aov).
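If you don't have the NHANES file at hand, you can practice the same two steps on simulated data. The sketch below invents a data frame called NHANES_sim (hypothetical; its glucose values are random numbers, so its F and p values will not match the ones discussed in this chapter) and then runs the aov and summary commands as described above.

```r
# Minimal sketch of the aov() + summary() workflow on simulated data.
# NHANES_sim is a hypothetical stand-in, NOT the real NHANES file.
set.seed(2024)
n <- 200
NHANES_sim <- data.frame(MARITAL = rep(1:3, length.out = n))
# Made-up glucose values; group 3 is given a slightly higher mean
NHANES_sim$LBXGLU <- rnorm(n, mean = 95 + 4 * (NHANES_sim$MARITAL == 3), sd = 12)

# Step 1: run the one-way ANOVA and save the results as an object
GLUCOSE_aov <- aov(LBXGLU ~ as.factor(MARITAL), data = NHANES_sim)

# Step 2: print the ANOVA table from the saved object
print(summary(GLUCOSE_aov))
```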

Interpreting the output of an ANOVA

We describe the R output here, but output from other statistical packages will have similar information.

The output begins with the analysis of variance table (or simply the ANOVA table). Its first column has no heading; it holds the row labels, with one row for the grouping variable, as.factor(MARITAL), and one row for Residuals. The remaining columns have the following headings: Df (degrees of freedom), Sum Sq (sum of squares), Mean Sq (mean square), F value (value of the F statistic), and Pr(>F) (p value for the F test). You may recall that for an ANOVA to be statistically significant at α = 0.05, the p value for the F statistic must be < 0.05. It is easy to find F = 12.59 in the output because it is labeled F value. But the p value is labeled Pr(>F), which is not very obvious; you can read it as “the probability of an F greater than the one observed.” As you saw before, the p value is shown in scientific notation, but it resolves to 0.00000353, which is < 0.05, so the result is statistically significant.
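The sketch below shows where Pr(>F) comes from. It uses simulated data (the data frame sim and its values are hypothetical, so the numbers will differ from the chapter's results), pulls the F value, the two degrees of freedom, and the p value out of the ANOVA table, and confirms that the p value equals the upper tail of the F distribution computed with R's pf function. It also shows that scientific notation is only another way of writing the same number.

```r
# Sketch: where Pr(>F) comes from, using simulated (hypothetical) data.
set.seed(7)
sim <- data.frame(group = factor(rep(c("A", "B", "C"), each = 30)))
group_means <- c(A = 100, B = 103, C = 108)  # made-up true means
sim$y <- rnorm(90, mean = group_means[as.character(sim$group)], sd = 10)

# The ANOVA table is stored as a data frame inside the summary object
tab <- summary(aov(y ~ group, data = sim))[[1]]
f_value  <- tab[["F value"]][1]
df_group <- tab[["Df"]][1]   # numerator df: number of groups - 1
df_resid <- tab[["Df"]][2]   # denominator df: N - number of groups
p_value  <- tab[["Pr(>F)"]][1]

# Pr(>F) is the area of the F distribution above the observed F
p_check <- pf(f_value, df_group, df_resid, lower.tail = FALSE)
print(all.equal(p_value, p_check))

# Decision at alpha = 0.05
print(p_value < 0.05)

# Scientific notation and decimal notation give the same number
print(format(3.53e-06, scientific = FALSE))
print(3.53e-06 == 0.00000353)
```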

If you use R for this, you will notice that at the bottom of the output it says Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1. This is R explaining its coding system for p values. If a p value in the output is followed by three asterisks, this is a code for p < 0.001. Two asterisks is a code for p < 0.01, and one asterisk indicates p < 0.05. A period indicates p < 0.1, and no symbol indicates that the p value is greater than or equal to 0.1, meaning that by most standards it is not statistically significant at all. Other statistical packages often use similar coding to make it easy for analysts to pick out statistically significant p values in the output.
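You can reproduce this coding system with R's cut function, which may be handy if you want to label p values the same way in your own tables. The helper signif_code() below is hypothetical, written only for illustration; it is not R's internal printing code, and its thresholds simply follow the legend as described above, with each interval including its lower bound.

```r
# Hypothetical helper: label p values with the significance codes
# described above (p < 0.001 -> "***", p < 0.01 -> "**", and so on).
signif_code <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   right = FALSE,           # intervals such as [0.001, 0.01)
                   include.lowest = TRUE))  # with right = FALSE, keeps p = 1
}

p_values <- c(0.00000353, 0.004, 0.03, 0.08, 0.40)
print(data.frame(p = p_values, code = signif_code(p_values)))
```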